October 2017 version of Kaldi #8

Open
wants to merge 2,437 commits into
base: qut-version

Conversation

himaivan

No description provided.

Xuechen Liu and others added 29 commits May 1, 2021 23:36
* Make BatchedThreadedNnet3CudaOnlinePipeline::CorrelationID
  public (fallout from #4490).
* Replace Win32-specific Sleep with portable C++ std::chrono
  classes in src/base/kaldi-utils.*. Change `Sleep(float)` arg
  type to `double` because this is what these classes take and
  return when converting to/from time in seconds.
* Use C++ constants for timers instead of sesquipedalian macros.
* Move includes under `HAVE_CUDA` conditionals in files without
  code that would make no sense without CUDA.
* Issue an #error if attempting to compile CUDA-dependent code
  with HAVE_CUDA not set.
* Add, reword some and reformat commentary and help strings in
  cudadecoderbin/batched-wav-nnet3-cuda-online.cc
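
For illustration, the portable `Sleep` replacement described above could look roughly like the sketch below. This is not the code from src/base/kaldi-utils.*; the names `SleepSeconds` and `TimedSleep` are hypothetical and exist only for this example. It assumes the argument is a non-negative time in seconds.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a portable Sleep(double); not the Kaldi source.
inline void SleepSeconds(double seconds) {
  assert(seconds >= 0.0);
  // duration<double> holds fractional seconds directly, which is why a
  // double argument is the natural fit for these classes.
  std::this_thread::sleep_for(std::chrono::duration<double>(seconds));
}

// Illustration helper: measures how long SleepSeconds() actually blocked.
inline double TimedSleep(double seconds) {
  const auto start = std::chrono::steady_clock::now();
  SleepSeconds(seconds);
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

`std::this_thread::sleep_for` blocks for at least the requested duration, so no platform-specific call such as Win32 `Sleep` or POSIX `usleep()` is needed.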

Accidental touch-paint fixes:
* Replace usleep() with kaldi::Sleep in util/kaldi-table-test.cc.
* Fix escape non-sequence '\%' where '%' was intended in strings
  in src/nnet3/nnet-utils.cc.
* KALDI_ASSERT enough space in a matrix column before
  memcpy()'ing into it.
* [chore] rearrange include guard to make including the
  file invariant w.r.t. HAVE_CUDA defined, and rename
  the guard preprocessor var to the coding conventions.
* Add two static member functions to CuDevice which may be
  called before Initialize() to set boolean options that
  are currently settable only from a config file:
  - EnableTensorCores()
  - EnableTf32Compute()
Chores:
* Remove unneeded inline keywords and mark const functions const.
* Rearrange includes, fix spacing, wrapping and commentary
  to the coding standard, and for Doxygen.
* Remove extra vertical and all EOL whitespace.
Change all uses of path variables to consistently use CMAKE_CURRENT_*
variables. Previously a combination of CMAKE_* and CMAKE_CURRENT_*
variables was used, which led to errors when Kaldi was added to another
CMake project using add_subdirectory.
)

Also add several defensive CU_SAFE_CALL guards that weren't
there before.

Co-authored-by: Daniel Galvez <[email protected]>
Before C++11 we declared the same members private, usually at the
very end of a class declaration. With standardization of the delete
keyword and adoption of the pattern, it is now idiomatic to make
deleted members public. The rationale is that deletion of copy members
is part of the public behavior of a type, namely its uncopyability,
and should be expressed in and readable from the public: section alone.

Google coding style, which we are generally following, does not
mention the 'DISALLOW_COPY_AND_ASSIGN' macro any more, and recommends
using the '= delete;' construct directly. From
https://google.github.io/styleguide/cppguide.html#Copyable_Movable_Types:

> Every class's public interface must make clear which copy and move
> operations the class supports. This should usually take the form of
> explicitly declaring and/or deleting the appropriate operations in
> the public section of the declaration.

Examples from the guide use the delete keyword straight without any
macro, but I think we should make an exception and continue using
the macro, which is more self-explanatory.

clang-tidy gets angry at non-public deleted members, too.
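
A minimal sketch of the pattern, assuming a macro that expands to the two deleted copy members and is placed in the public section. The macro name `DISALLOW_COPY_AND_ASSIGN_SKETCH` and the class `DeviceBufferSketch` are hypothetical and not taken from the Kaldi sources.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical macro for illustration: expands to deleted copy members.
#define DISALLOW_COPY_AND_ASSIGN_SKETCH(Type) \
  Type(const Type&) = delete;                 \
  Type& operator=(const Type&) = delete

// Hypothetical class; only the placement of the macro matters here.
class DeviceBufferSketch {
 public:
  explicit DeviceBufferSketch(int size) : size_(size) { assert(size >= 0); }

  // Uncopyability is part of the public behavior, so it is stated here
  // rather than hidden at the end of a private section.
  DISALLOW_COPY_AND_ASSIGN_SKETCH(DeviceBufferSketch);

  int size() const { return size_; }

 private:
  int size_;
};

static_assert(!std::is_copy_constructible<DeviceBufferSketch>::value,
              "copy construction must be disabled");
static_assert(!std::is_copy_assignable<DeviceBufferSketch>::value,
              "copy assignment must be disabled");
```

A reader of the `public:` section alone can now tell that the type cannot be copied, and an attempted copy gives a "use of deleted function" diagnostic instead of an access error.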
The file had not been used in any recipe.
Use 'steps/nnet3/align_lats.sh' instead (no 'chain').
For minor differences see #4549

Thanks @vesis84 for discovering and @desh2608 for analyzing the dupe!

X-Ref: #4543
Close: #4549
1. Order SUBDIRS alphabetically.
2. Obtain MEMTESTDIRS by set subtraction: there are only 3 members
   of SUBDIRS not in it.
3. Add a phony 'libs' target to build libs only. Just because.
4. Abbreviate a common dependency 'BMU = base matrix util', and
   order all listed dependencies of each target alphabetically
   (e.g., 'decoder: $(BMU) fstext gmm hmm lat transform tree').
5. Use true and false as values for the variable 'with_cudadecoder'
   in ./configure, as all other variables do. Change Makefiles in
   related subdirs to use the new value.
6. Bump 'CONFIGURE_VERSION' to 14 because of this.
7. Never set 'WITH_CUDADECODER = true' in kaldi.mk if CUDA is false
   or undefined. The previous default disregarded the detection of
   CUDA, the root cause of issue #4544.
8. Declare the 'depend' target phony.
9. Demote sgmm2 and sgmm2bin to EXTRA_SUBDIRS.
* Unpin memory of h_all_waveform_, deallocated but left pinned upon
  instance destruction.
* In the same added destructor, invoke WaitForLatticeCallbacks()
  to prevent a short race between the destruction of this instance and
  completion of callback threads, which decrement the pending callback
  count _after_ the callback has returned and been deleted.
* Assert the quiescent object state in the destructor.
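
The race described above can be sketched in miniature. Everything here is a simplified assumption, not the pipeline's actual code: `CallbackPipelineSketch`, `Launch`, `WaitForCallbacks` and `RunAndDestroy` are hypothetical names, and plain `std::thread` objects stand in for the real callback threads.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical miniature of an object whose callbacks run on other threads.
class CallbackPipelineSketch {
 public:
  ~CallbackPipelineSketch() {
    // Block until every callback has returned and its bookkeeping is done.
    WaitForCallbacks();
    for (std::thread& t : threads_) t.join();
    // Quiescent state: nothing may still be pending at this point.
    assert(pending_ == 0);
  }

  void Launch(std::function<void()> callback) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++pending_;
    }
    threads_.emplace_back([this, callback]() {
      callback();
      // The count drops only after the callback has returned. A destructor
      // that did not wait could tear down mutex_ and cv_ in this window.
      std::lock_guard<std::mutex> lock(mutex_);
      --pending_;
      cv_.notify_all();
    });
  }

  void WaitForCallbacks() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return pending_ == 0; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int pending_ = 0;
  std::vector<std::thread> threads_;
};

// Launches n slow callbacks, destroys the pipeline, reports how many ran.
inline int RunAndDestroy(int n) {
  std::atomic<int> completed{0};
  {
    CallbackPipelineSketch pipeline;
    for (int i = 0; i < n; ++i) {
      pipeline.Launch([&completed]() {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
        ++completed;
      });
    }
  }  // The destructor runs here and waits for all callbacks.
  return completed.load();
}
```

In this sketch the `join()` calls alone would be enough, because the object owns its threads. The explicit wait matters when, as in the description above, the callbacks run on threads the object does not join itself.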
BatchedThreadedNnet3CudaOnlinePipeline never freed a CudaFst object
it initialized. Freeing it caused an error returned from cudaFree
(with our allocator disabled).

The root cause of this problem was copying of the CudaFst in
CudaDecoder's constructor. The code has been changed so that
CudaDecoder only stores a reference to CudaFst, and the object
itself is uniquely owned by the pipeline.

This allowed folding of separate CudaFst::Initialize and
CudaFst::Finalize methods into its respective constructor and
destructor.

There is also a proof-of-concept unique_device_ptr, a specialization
of std::unique_ptr for device-allocated memory. The type is declared
privately in, and only used by, CudaFst. The Finalize method has been
removed entirely, and the memory is freed in the object's destructor,
as the members are destroyed.
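
As a rough sketch of the idea, a `unique_device_ptr` can be expressed as `std::unique_ptr` with a custom deleter. This is an assumption about the shape of the type, not the code in CudaFst. To keep the example runnable without CUDA, `std::malloc` and `std::free` stand in for `cudaMalloc` and `cudaFree`. The names `DeviceFreeSketch`, `AllocateDeviceSketch`, `FstSketch` and `g_free_count` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Counts frees so the sketch can be checked; real code would not need it.
static int g_free_count = 0;

// Stand-in deleter. In a CUDA build this would call cudaFree(ptr) and
// check the returned error code; std::free keeps the sketch portable.
struct DeviceFreeSketch {
  void operator()(void* ptr) const {
    std::free(ptr);
    ++g_free_count;
  }
};

template <typename T>
using unique_device_ptr = std::unique_ptr<T, DeviceFreeSketch>;

// Stand-in allocator; cudaMalloc would be used in a CUDA build.
template <typename T>
unique_device_ptr<T> AllocateDeviceSketch(std::size_t count) {
  void* raw = std::malloc(count * sizeof(T));
  assert(raw != nullptr);
  return unique_device_ptr<T>(static_cast<T*>(raw));
}

// Hypothetical owner: allocation in the constructor, release in the
// implicitly generated destructor, no separate Initialize/Finalize pair.
class FstSketch {
 public:
  explicit FstSketch(std::size_t num_arcs)
      : arc_weights_(AllocateDeviceSketch<float>(num_arcs)) {}

  FstSketch(const FstSketch&) = delete;
  FstSketch& operator=(const FstSketch&) = delete;

 private:
  unique_device_ptr<float> arc_weights_;
};
```

Because the owner is uncopyable and holds the memory through a unique pointer, a double free of the kind caused by copying the object cannot happen, and each member is released exactly once as it is destroyed.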
When python3 and numpy are available, verify that the generated
file is successfully loaded by numpy without exceptions by
invoking 'python3 -c ...' via a system() call.

Don't leave behind the temporary file. If python3 and numpy
aren't both available, print a warning and kill the file.
* Took heed from static analysis about an unpaired delete.
* Remove an unused data member and a GetPathName overload.
* Move the remaining used GetPathName() helper into the .cc
  file, as it is a static helper function which doesn't
  semantically belong in the class.
* Adjust formatting to the coding style (not exhaustively).
Remove 'template<bool B> class KaldiCompileTimeAssert' because
'static_assert' is now a language declaration. Before C++17 a
second argument for the message is required (lame, lame!), which,
lacking anything better, is obtained by stringizing the
function-like macro argument. In C++17, the second argument is
optional, and the semantics of 'KALDI_COMPILE_TIME_ASSERT' and
'static_assert' are identical when the second argument to the
latter is not provided.
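
The stringizing trick can be sketched as follows. The macro name `COMPILE_TIME_ASSERT_SKETCH` and the helper `BitsIn` are hypothetical; the actual definition of `KALDI_COMPILE_TIME_ASSERT` may differ in detail.

```cpp
#include <cstdint>

// The stringized condition doubles as the message that static_assert
// requires before C++17.
#define COMPILE_TIME_ASSERT_SKETCH(b) static_assert(b, #b)

// Usable at namespace scope...
COMPILE_TIME_ASSERT_SKETCH(sizeof(std::int32_t) == 4);

// ...and inside templates, where it is checked at instantiation time.
template <typename T>
constexpr int BitsIn() {
  COMPILE_TIME_ASSERT_SKETCH(sizeof(T) <= 8);
  return static_cast<int>(sizeof(T)) * 8;
}
```

When the condition is false, the compiler diagnostic quotes the failed expression itself, for example `sizeof(T) <= 8`, so no hand-written message is needed.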
* Initialize CuDevice::curand_handle_ to NULL in constructor.
* Remove stray include guards from cu-common.cc.
* Rearrange #include's per coding guidelines.
Also reformat string literals that ended up chopped into too many
pieces over too many lines, and add commentary on the importance
of data member ordering within the class.
Other minor tweaks:
 * Convert runtime asserts to static asserts in template.
 * Remove ThreadPoolLight::nworkers_ as it's equal to the
   .size() of the worker vector, and was used only once.
 * Replace pointer indirection with std::move and make
   ThreadPoolLightWorker::thread_ a direct class member.
 * Update to coding guidelines and IWYU.
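
A minimal sketch of two of the tweaks above, under assumed names: the worker holds its `std::thread` as a direct member that is filled by `std::move`, and the pool derives its worker count from the vector instead of a separate counter. `WorkerSketch`, `PoolSketch` and `RunPool` are hypothetical and much simpler than ThreadPoolLight.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

class WorkerSketch {
 public:
  WorkerSketch() = default;
  WorkerSketch(const WorkerSketch&) = delete;
  WorkerSketch& operator=(const WorkerSketch&) = delete;
  ~WorkerSketch() { Stop(); }

  // Takes ownership of a running thread by move; no heap-allocated
  // std::thread and no pointer indirection is involved.
  void Start(std::thread t) {
    assert(!thread_.joinable());
    thread_ = std::move(t);
  }

  void Stop() {
    if (thread_.joinable()) thread_.join();
  }

 private:
  std::thread thread_;  // Direct member, default-constructed as "no thread".
};

class PoolSketch {
 public:
  explicit PoolSketch(std::size_t n) : workers_(n) {}

  // The worker count is simply the vector size; no extra data member.
  std::size_t NumWorkers() const { return workers_.size(); }

  template <typename F>
  void StartAll(F fn) {
    for (WorkerSketch& w : workers_) w.Start(std::thread(fn));
  }

  void StopAll() {
    for (WorkerSketch& w : workers_) w.Stop();
  }

 private:
  std::vector<WorkerSketch> workers_;
};

// Runs one trivial task per worker and returns how many tasks completed.
inline int RunPool(std::size_t n) {
  std::atomic<int> counter{0};
  PoolSketch pool(n);
  pool.StartAll([&counter]() { ++counter; });
  pool.StopAll();
  return counter.load();
}
```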
…4569)

 * Wait for D2H then H2H transfers to dry out before stopping the pump.
 * Assert no H2H tasks after it stops.
 * Assert that an offset into vector is within bounds (in the #4556 scenario, it's not).
 * Add CU_SAFE_CALL to every CUDA call.
 * Use compile time asserts on constants instead of runtime asserts
 * The rest is reformatting comments, IWYU and updates to coding conventions.
danpovey and others added 30 commits June 3, 2024 11:45
Fix missing FLT_MAX in some CUDA installation scenarios.
Fix reported issues w.r.t. Python 2.7 and some Apple silicon quirks
Support for both OpenFST 1.7.3 and 1.8.2
* upload the error logs to artifact repository

* fix yaml error
* upgrade the checkout action on build pipeline

* upgrade the checkout action on build pipeline
Make it buildable under Fedora 41 using their board tools
* Support OpenBLAS with WASM

* Update README

* Force g_num_threads = 0 in WASM